Schema Extraction for Semi-Structured Data

Authors

  • Mohand-Said Hacid
  • Lina Fatima Soualmia
  • Farouk Toumani
Abstract

The emerging field of semistructured data leads to new ways of representing data as schemaless or self-describing. However, in many applications data has often some regularity, and ignoring the (possibly partial) structure hinders the abilities to interpret the data and to access them efficiently. In this paper, we investigate a knowledge-based approach for discovering partial implicit structures from semistructured data. We show that semistructured data represented in the form of labeled directed graphs can be typed using description logics.

Similar resources

Semi-Structured Data Extraction and Schema Knowledge Mining

It is well known that the World Wide Web has become a huge information resource. Therefore, it is very important for us to utilize this kind of information effectively. This paper proposes a semi-structured data extraction method to get the useful information embedded in a group of relevant web pages, and store it with OEM (Object Exchange Model). Then, we adopt data mining method to discover schema...

Ontology Driven Web Extraction from Semi-structured and Unstructured Data for B2B Market Analysis

The Market Blended Insight project has the objective of improving the UK business to business marketing performance using the semantic web technologies. In this project, we are implementing an ontology driven web extraction and translation framework to supplement our backend triple store of UK companies, people and geographical information. It deals with both the semi-structured data and the un...

Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores

Although most NoSQL Data Stores are schema-less, information on the structural properties of the persisted data is nevertheless essential during application development. Otherwise, accessing the data becomes simply impractical. In this paper, we introduce an algorithm for schema extraction that is operating outside of the NoSQL data store. Our method is specifically targeted at semi-structured ...

Une approche matérialisée basée sur les vues pour l'intégration de documents XML. (A view-based approach to the integration of structured and semi-structured data)

Semi-structured data play an increasing role in the development of the Web through the use of XML. However, the management of semi-structured data poses specific problems because semi-structured data, contrary to classical databases, do not rely on a predefined schema. The schema of a document is contained in the document itself and similar documents may be represented by different sc...

Information Extraction with and without Parsing Semi-structured Documents

Information extraction from semi-structured documents comprises contents detection, wrapper generation and schema extraction. The contents detection step corresponds to making training examples in wrapper induction based on machine learning and the schema extraction identifies extracted data types. We formulate the contents detection using the repetitive pattern introduced in this paper. That i...

Publication date: 2000